Abstract

Distributed source coding schemes are typically based on the use of channel codes as source codes. In this paper we propose a new paradigm, named "distributed arithmetic coding", which extends arithmetic codes to the distributed case by employing sequential decoding aided by the side information. In particular, we introduce a distributed binary arithmetic coder for the Slepian-Wolf coding problem, along with a joint decoder. The proposed scheme can be applied to two sources in both the asymmetric mode, wherein one source acts as side information, and the symmetric mode, wherein both sources are coded with ambiguity, at any combination of achievable rates. Distributed arithmetic coding provides several advantages over existing Slepian-Wolf coders, especially good performance at small block lengths, and the ability to incorporate arbitrary source models in the encoding process, e.g., context-based statistical models, in much the same way as a classical arithmetic coder. We have compared the performance of distributed arithmetic coding with turbo codes and low-density parity-check codes, and found that the proposed approach is very competitive.
Distributed Arithmetic Coding for the Slepian-Wolf problem

I. Introduction and background 

In recent years, distributed source coding (DSC) has received increasing attention from the signal processing community. DSC considers a situation in which two (or more) statistically dependent sources X and Y must be encoded by separate encoders that are not allowed to communicate with each other. Performing separate lossless compression may seem less efficient than joint encoding. However, DSC theory proves that, under certain assumptions, separate encoding is optimal, provided that the sources are decoded jointly [1]. For example, with two sources it is possible to perform "standard" encoding of the first source (called side information) at a rate equal to its entropy, and "conditional" encoding of the second one at a rate lower than its entropy, with no information about the first source available at the second encoder; we refer to this as the "asymmetric" Slepian-Wolf (S-W) problem. Alternatively, both sources can be encoded at rates smaller than their respective entropies, and decoded jointly, which we refer to as "symmetric" S-W coding.

DSC theory also encompasses lossy compression [2]; it has been shown that, under certain conditions, there is no performance loss in using DSC [2], [3], and that possible losses are bounded below 0.5 bit per sample (bps) for the quadratic distortion metric [4]. In practice, lossy DSC is typically implemented using a quantizer followed by lossless DSC, while the decoder consists of the joint decoder followed by a joint dequantizer. Lossless and lossy DSC have several potential applications, e.g., coding for non-co-located sources such as sensor networks, distributed video coding [5], [6], [7], [8], layered video coding [9], [10], error resilient video coding [11], and satellite image coding [12], [13], just to mention a few. The interested reader is referred to [14] for an excellent tutorial.

Traditional entropy coding of an information source can be performed using one of many available methods, the most popular being arithmetic coding (AC) and Huffman coding. "Conditional"
(i.e., DSC) coders are typically implemented using channel codes, by representing the source using 
the syndrome or the parity bits of a suitable channel code of given rate. The syndrome identifies 
sets of codewords ("cosets") with maximum distance properties, so that decoding an ambiguous 
description of a source at a rate less than its entropy (given the side information) incurs minimum 
error probability. If the correlation between X and Y can be modeled as a "virtual" channel described 
as X = Y + W, with W an additive noise process, a good channel code for that transmission problem 
is also expected to be a good S-W source code [3]. 
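As a toy illustration of the coset idea only (it is not one of the codes discussed in this paper), the following Python sketch uses the (7,4) Hamming code: the encoder describes a 7-bit block of x by its 3-bit syndrome, and the decoder picks the member of that coset closest to the side information y. It assumes the virtual channel flips at most one bit per block; the function names are hypothetical, and the brute-force coset search is only sensible at this tiny block length.

```python
import itertools

# Parity-check matrix of the (7,4) Hamming code: column j holds the binary
# expansion of j + 1, so all seven columns are distinct and nonzero.
H = [[((j + 1) >> b) & 1 for j in range(7)] for b in range(3)]


def syndrome(bits):
    """Syndrome H * bits (mod 2) of a 7-bit word, as a 3-tuple."""
    return tuple(sum(h * v for h, v in zip(row, bits)) % 2 for row in H)


def sw_encode(x):
    """Encoder: describe the 7-bit block x only by its coset index (syndrome)."""
    return syndrome(x)


def sw_decode(s, y):
    """Decoder: the member of coset s that is closest to y in Hamming distance."""
    coset = [c for c in itertools.product((0, 1), repeat=7) if syndrome(c) == s]
    return min(coset, key=lambda c: sum(a != b for a, b in zip(c, y)))


x = (1, 0, 1, 1, 0, 0, 1)
y = (1, 0, 1, 0, 0, 0, 1)          # side information: x with one bit flipped
x_hat = sw_decode(sw_encode(x), y)  # 3 bits were sent instead of 7
```

Under that assumption every block is recovered exactly from 3 bits instead of 7, because two words of the same coset differ in at least three positions.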
Regarding asymmetric S-W coding, the first practical technique was described in [15], and employs trellis codes. Recently, more powerful channel codes such as turbo codes have been proposed in [6], [16], [17], and low-density parity-check (LDPC) [18] codes have been used in [19], [20], [21]. Turbo and LDPC codes can get extremely close to channel capacity, although they require the block size to be rather large. Note that the constituent codes of turbo codes are convolutional codes, hence the syndrome is difficult to compute. In [6] the cosets are formed by all messages that produce the same parity bits, even though this approach is somewhat suboptimal [17], since the geometrical properties of these cosets are not as good as those of syndrome-based coding. In [22] a syndrome former is used to deal with this problem. Multilevel codes have also been addressed; in [23] trellis codes are extended to multilevel sources, whereas in [24] a similar approach is proposed for LDPC codes.

Besides techniques based on channel coding, a few authors have also investigated the use of source coders for DSC. This is motivated by the fact that existing source coders have desirable compression features that should be retained in a DSC coder, such as the ability to employ flexible and adaptive probability models, and low encoding complexity. In [25] the problem of designing a variable-length DSC coder is addressed; it is shown that designing such a coder with zero error probability is NP-hard. In [26] a similar approach is followed; the authors consider the problem of designing Huffman and arithmetic DSC coders for multilevel sources with zero or almost-zero error probability. The idea is that, if the joint density of the source and the side information satisfies certain conditions, the same codeword (or the same interval for the AC process) can be associated with multiple symbols. This approach leads to an encoder with a complex modeling stage (NP-hard for the optimal code, though suboptimal polynomial-time algorithms are provided in [26]), while the decoding process resembles a classical arithmetic decoder.

As for symmetric S-W codes, a few techniques have been recently proposed. A symmetric code can be obtained from an asymmetric one through time sharing, whereby the two sources alternately take the role of the source and the side information; however, current DSC coders cannot easily
accommodate this approach. Syndrome-based channel code partitioning has been introduced in [27], 
and extended in [28] to systematic codes. A similar technique is described in [29], encompassing 
non-systematic codes. Syndrome formers have also been proposed for symmetric S-W coding [30]. 
Moreover, techniques based on the use of parity bits can also be employed, as they can typically 
provide rate compatibility. A practical code has been proposed in [16] using two turbo codes that are 
decoded jointly, achieving the equal rate point; in [31] an algorithm is introduced that employs turbo 
codes to achieve arbitrary rate splitting. Symmetric S-W codes based on LDPC codes have also been 
developed [32], [33]. 
Although several near-optimal DSC coders have been designed for simple ideal sources (e.g., binary and Gaussian sources), the application of practical DSC schemes to realistic signals typically incurs the following problems.

• Channel codes get very close to capacity only for very large data blocks (typically in excess of $10^5$ symbols). In many applications, however, the basic units to be encoded are of the order of a few hundred to a few thousand symbols. For such block lengths, channel codes have good but not optimal performance.

• The symbols contained in a block are expected to follow a stationary statistical distribution. 
However, typical real-world sources are not stationary. This calls for either the use of short 
blocks, which weakens the performance of the S-W coder, or the estimation of conditional 
probabilities over contexts, which cannot be easily accommodated by existing S-W coders. 

• When the sources are strongly correlated (i.e., in the most favorable case), very high-rate channel codes are needed (i.e., codes whose rate approaches 1). However, capacity-achieving channel codes are often not very efficient at high rate.

• In those applications where DSC is used to limit the encoder complexity, it should be noted that 
the complexity of existing S-W coders is not negligible, and often higher than that of existing 
non-DSC coders. This seriously weakens the benefits of DSC. 

• Upgrading an existing compression algorithm like JPEG 2000 or H.264/AVC to provide DSC functionalities requires at least redesigning the entropy coding stage, adopting one of the existing DSC schemes.

Among these issues, the block length is particularly important. While it has been shown that, on 
ideal sources with very large block length, the performance of some practical DSC coders can be 
as close as 0.09 bits to the theoretical limit [14], so far DSC of real-world data has fallen short of 
its expectations, one reason being the necessity to employ much smaller blocks. For example, the 
PRISM video coder [5] encodes each macroblock independently, with a block length of 256 samples. 
For the coder in [6], the block length is equal to the number of 8×8 blocks in one picture (1584 for the CIF format). The performance of both coders is rather far from optimal, highlighting the need for DSC coders for realistic block lengths.

A solution to this problem has been introduced in [34], where an extension of AC, named distributed 
arithmetic coding (DAC), has been proposed for asymmetric S-W coding. Moreover, in [35] DAC 
has been extended to the case of symmetric S-W coding of two sources at the same rate (i.e., the 
mid-point of the S-W rate region). DAC and its decoding process do not currently have a rigorous mathematical theory that proves they can asymptotically achieve the S-W rate region; such a theory is very difficult to develop because of the non-linearity of AC. However, DAC is a practical algorithm that was shown in [34] to outperform other existing distributed coders. In this paper, we build on the results presented in [34], providing several new contributions. For asymmetric coding, we focus on i.i.d. sources, as these are found in many DSC applications; for example, in transform-domain distributed video coding, DAC could be applied to the bit-planes of transform coefficients, which can be modeled as i.i.d. We optimize the DAC using an improved encoder termination procedure, and we investigate the rate allocation problem, i.e., how to optimally select the encoding parameters to achieve a desired target rate. We evaluate the performance of this new design by comparing it with turbo and LDPC codes, including the case of extremely correlated sources with highly skewed probabilities. This is of interest in multimedia applications because the most significant bit-planes of the transform coefficients of an image or video sequence are almost always equal to zero, and are strongly correlated with the side information. For symmetric coding, we extend our previous work in [35] by introducing DAC encoding and rate allocation procedures that allow an arbitrary number of sources to be encoded with an arbitrary combination of rates. We develop and test the decoder for two sources.

Finally, it should be noted that an asymmetric DAC scheme has been independently and concurrently 
developed in [36] using quasi-arithmetic codes. Quasi-arithmetic codes are a low-complexity approximation to arithmetic codes, providing lower encoding and decoding complexity [37]. These codes
allow the interval endpoints to be only a finite set of points. While this yields suboptimal compression 
performance, it makes the arithmetic coder a finite state machine, simplifying the decoding process 
with side information. 

This paper is organized as follows. In Sect. II we describe the DAC encoding process for the asymmetric case, in Sect. III we describe the DAC decoder, and in Sect. IV we study the rate allocation and parameter selection problem. In Sect. V we describe the DAC encoder, decoder and rate allocator for the symmetric case. In Sects. VI and VII we report the DAC performance evaluation results in the asymmetric and symmetric case, respectively. Finally, in Sect. VIII we draw some conclusions.

II. Distributed arithmetic coding: asymmetric encoder 

Before describing the DAC encoder, it should be noted that the AC process typically consists of a 
modeling stage and a coding stage. The modeling stage has the purpose of computing the parameters 
of a suitable statistical model of the source, in terms of the probability that a given bit takes on value 0 or 1. This model can be arbitrarily sophisticated, e.g., by using contexts, adaptive probability
estimation, and so forth. The coding stage takes the probabilities as input, and implements the actual 
AC procedure, which outputs a binary codeword describing the input sequence. 
Let X be a binary memoryless source that emits a semi-infinite sequence of random variables $X_i$, $i = 0, 1, \ldots$, with probabilities $p_0^X = P(X_i = 0)$ and $p_1^X = P(X_i = 1)$. We are concerned with encoding the sequence $\mathbf{x} = [x_0, \ldots, x_{N-1}]$ consisting of the first N occurrences of this source. The modeling and coding stages are shown in Fig. 1-a. The modeling stage takes as input the sequence $\mathbf{x}$, and outputs an estimate of the probabilities $p_0^X$ and $p_1^X$. The coding stage takes as input $\mathbf{x}$, $p_0^X$ and $p_1^X$, and generates a codeword $C_X$. The expected length of $C_X$ depends on $p_0^X$ and $p_1^X$, and is determined once these probabilities are given.

In order to use the DAC, we consider two sources X and Y, where Y is a binary memoryless source that emits random variables $Y_i$, $i = 0, 1, \ldots$, with probabilities $p_0^Y = P(Y_i = 0)$ and $p_1^Y = P(Y_i = 1)$. The first N occurrences of this source form the side information $\mathbf{y} = [y_0, \ldots, y_{N-1}]$. We assume that X and Y are i.i.d. sources, and that $X_i$ and $Y_i$ are statistically dependent for a given i. The entropy of X is defined as $H(X) = -\sum_{j=0}^{1} p_j^X \log_2 p_j^X$, and similarly for Y. The conditional entropy of X given Y is defined as

$$H(X|Y) = -\sum_{j=0}^{1} \sum_{k=0}^{1} P(X_i = j, Y_i = k) \log_2 P(X_i = j \mid Y_i = k).$$
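For a concrete binary model these definitions can be evaluated directly. The sketch below assumes the joint pmf $P(X_i = j, Y_i = k)$ is given as a 2×2 nested list `joint[j][k]` (a hypothetical layout, with hypothetical function names). For example, a uniform X observed through a binary symmetric channel with crossover probability 0.1 gives H(X) = 1 and H(X|Y) = h(0.1) ≈ 0.469 bit, where h is the binary entropy function.

```python
import math


def entropy(p):
    """H = -sum_j p_j log2 p_j for a probability vector p (0 log 0 taken as 0)."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def conditional_entropy(joint):
    """H(X|Y) = -sum_{j,k} P(X=j, Y=k) log2 P(X=j | Y=k).

    joint[j][k] holds P(X_i = j, Y_i = k) for binary X and Y.
    """
    p_y = [joint[0][k] + joint[1][k] for k in range(2)]   # marginal of Y
    h = 0.0
    for j in range(2):
        for k in range(2):
            if joint[j][k] > 0:
                h -= joint[j][k] * math.log2(joint[j][k] / p_y[k])
    return h


joint = [[0.45, 0.05], [0.05, 0.45]]   # uniform X, Y = X through a BSC(0.1)
h_x = entropy([0.5, 0.5])
h_x_given_y = conditional_entropy(joint)
```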

For DAC, three blocks can be identified, as in Fig. 1-b, namely the modeling, rate allocation, and coding stages. The modeling stage is exactly the same as in the classical AC. The coding stage will be described in Sect. II-B; it takes as inputs $\mathbf{x}$, the probabilities $p_0^X$ and $p_1^X$, and the parameter $k_X$, and outputs a codeword $C'_X$. Unlike a classical AC, where the expected rate is a function of the source probabilities, and hence cannot be selected a priori, the DAC allows the selection of any desired rate not larger than the expected rate of a classical AC. This is very important, since in a DSC setting the rate for $\mathbf{x}$ should depend not only on how "compressible" the source is, but also on how strongly $X_i$ and $Y_i$ are correlated. For this reason, in DAC we also have a rate allocation stage that takes as input the probabilities $p_0^X$ and $p_1^X$ and the conditional entropy $H(X|Y)$, and outputs a parameter $k_X$ that drives the DAC coding stage to achieve the desired target rate.

In this paper we deal with the coding and rate allocation stages, and assume that the input probabilities $p_0^X$, $p_1^X$ and the conditional entropy $H(X|Y)$ are known a priori. This allows us to focus on the distributed coding aspects of the proposed scheme and, at the same time, keeps the scheme independent of the modeling stage.

A. Arithmetic coding 

We first review the classical AC coding process, as this sets the stage for the description of the DAC encoder; an overview can be found in [38]. The binary AC process for $\mathbf{x}$ is based on the probabilities $p_0^X$ and $p_1^X$, which are used to partition the [0, 1) interval into sub-intervals associated with possible occurrences of the input symbols. At initialization the "current" interval is set to $I_0 = [0, 1)$. For each input symbol $x_i$, the current interval $I_i$ is partitioned into two adjacent sub-intervals of lengths $p_0^X |I_i|$ and $p_1^X |I_i|$, where $|I_i|$ is the length of $I_i$. The sub-interval corresponding to the actual value of $x_i$ is selected as the next current interval $I_{i+1}$, and this procedure is repeated for the next symbol. After all N symbols have been processed, the sequence is represented by the final interval $I_N$. The codeword $C_X$ can consist of the binary representation of any number inside $I_N$ (e.g., the number in $I_N$ with the shortest binary representation), and requires approximately $-\log_2 |I_N|$ bits.

Fig. 1. Modeling, rate allocation and coding stage for (a) classical AC, and (b) DAC.
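The interval subdivision above can be sketched in a few lines of Python with exact rational arithmetic, matching the infinite-precision description rather than the fixed-point implementation with renormalization used in practice. The convention that symbol 0 takes the lower sub-interval, and the function names, are assumptions of this sketch.

```python
from fractions import Fraction
import math


def ac_encode(x, p0):
    """Infinite-precision binary AC; returns the final interval as (low, width).

    Symbol 0 takes the lower sub-interval of length p0*|I_i|, symbol 1 the
    upper one of length (1 - p0)*|I_i|.
    """
    p0 = Fraction(p0)
    low, width = Fraction(0), Fraction(1)
    for bit in x:
        if bit == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, width


def shortest_codeword(low, width):
    """Bits of the number in [low, low + width) with the shortest binary expansion.

    An empty string stands for the value 0.
    """
    n = 0
    while True:
        m = math.ceil(low * 2 ** n)        # smallest m with m / 2^n >= low
        if Fraction(m, 2 ** n) < low + width:
            return format(m, "b").zfill(n) if n else ""
        n += 1


low, width = ac_encode([0, 0, 1, 0], Fraction(4, 5))
codeword = shortest_codeword(low, width)
```

For $\mathbf{x} = [0, 0, 1, 0]$ and $p_0^X = 4/5$, the final interval starts at 64/125 and has length 64/625, i.e., about $2^{-3.29}$; the shortest number inside it is 9/16, written 0.1001 in binary, so the codeword has 4 bits.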

B. DAC encoder 

Similarly to other S-W coders, DAC is based on the principle of inserting some ambiguity in the 
source description during the encoding process. This is obtained using a modified interval subdivision 
strategy. In particular, the DAC employs a set of intervals whose lengths are proportional to the modified probabilities $\tilde{p}_0^X$ and $\tilde{p}_1^X$, such that $\tilde{p}_0^X > p_0^X$ and $\tilde{p}_1^X > p_1^X$. In order to fit the enlarged sub-intervals into the [0, 1) interval, they are allowed to partially overlap. This prevents the decoder from discriminating the correct interval, unless the side information is used.

The detailed DAC encoding procedure is described in the following. At initialization the "current" interval is set to $I'_0 = [0, 1)$. For each input symbol $x_i$, the current interval $I'_i$ is subdivided into two partially overlapped sub-intervals whose lengths are $\tilde{p}_0^X |I'_i|$ and $\tilde{p}_1^X |I'_i|$. The interval representing symbol $x_i$ is selected as the next current interval $I'_{i+1}$. After all N symbols have been processed, the sequence is represented by the final interval $I'_N$. The codeword $C'_X$ can consist of the binary representation of any number inside $I'_N$, and requires approximately $-\log_2 |I'_N|$ bits. This procedure is sketched in Fig. 2. At the decoder side, whenever the codeword points to an overlapped region, the input symbol cannot be detected unambiguously, and additional information must be exploited by the joint decoder to resolve the ambiguity. It is worth noticing that the DAC encoding procedure is a generalization of AC. Letting $\tilde{p}_0^X = p_0^X$ and $\tilde{p}_1^X = p_1^X$ leads to the AC encoding process described in Sect. II-A, with $I'_N = I_N$ and $C'_X = C_X$.

Fig. 2. Distributed arithmetic encoding procedure for a block of three symbols.
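A minimal sketch of the DAC encoder under the same assumptions as a plain infinite-precision AC (exact fractions, symbol 0 in the lower part of the interval). The enlarged probabilities are written as $\tilde{p}_j^X = \alpha_j^X p_j^X$, the parametrization of Sect. IV, so that, relative to the current interval, symbol 0 maps to $[0, \tilde{p}_0^X)$ and symbol 1 to $[1 - \tilde{p}_1^X, 1)$. Function and argument names are hypothetical; setting both factors to 1 reproduces classical AC.

```python
from fractions import Fraction


def dac_encode(x, p0, alpha0, alpha1):
    """Infinite-precision DAC encoder; returns the final interval as (low, width).

    Relative to the current interval I'_i, symbol 0 selects [0, a0*p0) and
    symbol 1 selects [1 - a1*p1, 1); they overlap when a0*p0 + a1*p1 > 1.
    """
    p0 = Fraction(p0)
    q0 = Fraction(alpha0) * p0            # enlarged probability of symbol 0
    q1 = Fraction(alpha1) * (1 - p0)      # enlarged probability of symbol 1
    assert q0 <= 1 and q1 <= 1, "enlarged intervals must fit in [0, 1)"
    low, width = Fraction(0), Fraction(1)
    for bit in x:
        if bit == 0:
            width *= q0
        else:
            low += width * (1 - q1)
            width *= q1
    return low, width


half = Fraction(1, 2)
overlapped = dac_encode([1, 0, 1], half, Fraction(6, 5), Fraction(6, 5))
```

With $p_0^X = 1/2$ and $\alpha_0^X = \alpha_1^X = 6/5$, encoding [1, 0, 1] yields a final interval of length 27/125 (about 2.2 bits instead of 3), and any codeword value falling in the middle fifth of an interval is compatible with both symbols.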

It should also be noted that, for simplicity, the description of the AC and DAC provided above 
assumes infinite precision arithmetic. The practical implementation used in Sect. VI and VII employs 
fixed-point arithmetic and interval renormalization. 

III. Decoding for the asymmetric case 

The objective of the DAC decoder is joint decoding of the sequence x given the correlated side 
information y. The arithmetic decoding machinery of the DAC decoder presents limited modifications 
with respect to standard arithmetic decoders; a fixed-point implementation has been employed, with 
the same interval scaling and overlapping rules used at the encoder. In the following, the arithmetic decoder state at the i-th decoding step is denoted as $\sigma_i$, $i = 0, \ldots, N-1$. The data stored in $\sigma_i$ represent the interval $I'_i$ and the codeword at iteration i.

The decoding process can be formulated as a symbol-driven sequential search along a proper decoding tree, where each node represents a state $\sigma_i$, and a path in the tree represents a possible decoded sequence. The following elementary decoding functions are required to explore the tree:
• $(\hat{x}_i, \sigma_{i+1})$ = Test-One-Symbol($\sigma_i$): it computes the sub-intervals at the i-th step, compares them with $C'_X$, and outputs either an unambiguous symbol $\hat{x}_i \in \{0, 1\}$ (if $C'_X$ belongs to one of the non-overlapped regions), or an ambiguous symbol $\hat{x}_i = \mathcal{A}$. In case of unambiguous decoding, the new decoder state $\sigma_{i+1}$ is returned for the following iterations.
• $\sigma_{i+1}$ = Force-One-Symbol($\sigma_i$, $x_i$): it forces the decoder to select the sub-interval corresponding to the symbol $x_i$ regardless of the ambiguity; the updated decoder state is returned.
In Fig. 3 an example of a section of the decoding tree is shown. In this example the decoder is not able to make a decision on the i-th symbol, as Test-One-Symbol returns $\hat{x}_i = \mathcal{A}$. As a consequence, two alternative decoding attempts are pursued by calling Force-One-Symbol with $x_i = 0$ and $x_i = 1$, respectively. In principle, by iterating this process, the tree $\mathcal{T}$, representing all the possible decoded sequences, can be explored. The best decoded sequence can finally be selected by applying the maximum a posteriori (MAP) criterion

$$\hat{\mathbf{x}} = \arg\max_{\mathcal{T}} P(X_0, \ldots, X_{N-1} \mid C'_X, \mathbf{y}).$$

In general, exhaustive search cannot be applied due to the exponential growth of $\mathcal{T}$. A viable solution is obtained by applying the breadth-first sequential search known as the M-algorithm [39], [40]; at each tree depth, only the M nodes with the best partial metric are retained. This amounts to visiting only a subset of the most likely paths in $\mathcal{T}$. The MAP metric for a given node can be evaluated as follows:

$$P(X_0 = x_0, \ldots, X_i = x_i \mid C'_X, \mathbf{y}) = \prod_{j=0}^{i} P(X_j = x_j \mid C'_X, Y_j = y_j) \qquad (1)$$

Metric (1) can be expressed as a sum of additive terms by setting:

$$\Lambda_{i+1} \triangleq \log P(X_0 = x_0, \ldots, X_i = x_i \mid C'_X, \mathbf{y}) = \sum_{j=0}^{i} \lambda_j, \qquad \lambda_j \triangleq \log P(X_j = x_j \mid Y_j = y_j) \qquad (2)$$

where $\Lambda_0 = 0$ and $\lambda_j$ represents the additive metric to be associated with each branch of $\mathcal{T}$.
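The branch metric in (2) only needs the conditional pmf of $X_j$ given $Y_j$. A small sketch, assuming the joint pmf is given as `joint[x][y]` (hypothetical layout and names); natural logarithms are used, which does not change the ranking of paths.

```python
import math


def branch_metric(x_j, y_j, joint):
    """lambda_j = log P(X_j = x_j | Y_j = y_j), with joint[x][y] = P(X = x, Y = y)."""
    return math.log(joint[x_j][y_j] / (joint[0][y_j] + joint[1][y_j]))


def path_metric(x_path, y, joint):
    """Lambda_{i+1} = sum_{j=0}^{i} lambda_j for a candidate prefix x_0..x_i."""
    return sum(branch_metric(a, b, joint) for a, b in zip(x_path, y))


joint = [[0.45, 0.05], [0.05, 0.45]]       # uniform X observed through a BSC(0.1)
m = path_metric([0, 1, 1], [0, 1, 0], joint)
```

A candidate prefix that disagrees with the side information in one position, as above, pays one term $\log 0.1$ instead of $\log 0.9$.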

The pseudocode for the DAC decoder is given in Algorithm 1, where $\mathcal{T}_i$ represents the list of nodes in $\mathcal{T}$ explored at depth i; each tree node stores its corresponding arithmetic decoder state $\sigma_i$ and the accumulated metric $\Lambda_i$.

It is worth pointing out that M has to be selected as a trade-off between the memory/complexity requirements and the error probability, i.e., the probability that the path corresponding to the original sequence $\mathbf{x}$ is accidentally dropped. As in the case of standard Viterbi decoding, the path metric turns out to be stable and reliable as long as a significant number of terms, i.e., of decoded symbols $\hat{x}_i$, are taken into account. In the pessimistic case when all symbol positions i trigger a decoder branching, given M, one can guarantee that at least $\log_2 M$ symbols are considered for metric comparisons and pruning. On the other hand, in practical cases, the interval overlap is only partial and branching does not occur at every symbol iteration. All the experimental results presented in Sect. VI have been obtained using M = 2048, while the trade-off between performance and complexity is analyzed in Sect. VI-F.
Algorithm 1 DAC decoder (asymmetric case)
Initialize $\mathcal{T}_0$ with root node $(\sigma_0, \Lambda_0 = 0)$
Set symbol counter $i \leftarrow 0$
while ($i < N$) do
    for all nodes $(\sigma_i, \Lambda_i)$ in $\mathcal{T}_i$ do
        $(\hat{x}_i, \sigma_{i+1})$ = Test-One-Symbol($\sigma_i$)
        if $\hat{x}_i = \mathcal{A}$ then
            for $k = 0, 1$ do
                $\sigma_{i+1}$ = Force-One-Symbol($\sigma_i$, $x_i = k$)
                $\Lambda_{i+1} \leftarrow \Lambda_i + \lambda_i$
                Insert $(\sigma_{i+1}, \Lambda_{i+1})$ in $\mathcal{T}_{i+1}$
            end for
        else
            $\Lambda_{i+1} \leftarrow \Lambda_i + \lambda_i$
            Insert $(\sigma_{i+1}, \Lambda_{i+1})$ in $\mathcal{T}_{i+1}$
        end if
    end for
    Sort nodes in $\mathcal{T}_{i+1}$ according to metric $\Lambda_{i+1}$
    Keep only the M nodes with best metric in $\mathcal{T}_{i+1}$
    $i \leftarrow i + 1$
end while
Output $\hat{\mathbf{x}}$ (sequence corresponding to the first node stored in $\mathcal{T}_N$)



Finally, metric reliability cannot be guaranteed for the very last symbols of a finite-length sequence 
x. For channel codes, e.g., convolutional codes, this issue is tackled by imposing a proper termination 
strategy, e.g., forcing the encoded sequence to end in the first state of the trellis. A similar approach is 
necessary when using DAC. Examples of AC termination strategies are encoding a known termination 
pattern or end-of-block symbol with a certain probability or, in the case of context-based AC, driving 
the AC encoder in a given context. For DAC, we employ a new termination policy that is tailored 
to its particular features. In particular, termination is obtained by encoding the last T symbols of the sequence without interval overlap, i.e., using $\tilde{p}_j^X = p_j^X$ for all symbols $x_i$ with $i \ge N - T$. As a consequence, no nodes in the DAC decoding tree will cause branching in the last T steps, making the final metrics more reliable for the selection of the most likely sequence. However, there is a rate penalty for the termination symbols.
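To make Algorithm 1 and the termination policy concrete, here is a compact end-to-end sketch. It is not the fixed-point implementation used in Sects. VI and VII: it uses exact fractions, assumes an equiprobable source ($p_0^X = p_1^X = 1/2$) with equal overlap factors, so that both symbols use the same enlarged probability q, and models the correlation as a binary symmetric channel with crossover probability `eps`, so that $\lambda_j$ is either $\log(1 - \epsilon)$ or $\log \epsilon$. All names are hypothetical.

```python
from fractions import Fraction
import math
import random


def schedule(n, q, t):
    """Enlarged probability per symbol: q (overlap) for the first n - t symbols,
    1/2 (no overlap) for the last t termination symbols."""
    return [q if i < n - t else Fraction(1, 2) for i in range(n)]


def encode(x, qs):
    """DAC encoder for an equiprobable source. Returns the dyadic value with
    the fewest bits inside the final interval, and that number of bits."""
    low, width = Fraction(0), Fraction(1)
    for bit, q in zip(x, qs):
        if bit:
            low += width * (1 - q)       # symbol 1 takes the top of the interval
        width *= q
    n = 0
    while True:
        m = math.ceil(low * 2 ** n)
        if Fraction(m, 2 ** n) < low + width:
            return Fraction(m, 2 ** n), n
        n += 1


def decode(c, y, qs, eps, m_max):
    """M-algorithm search of the decoding tree, following Algorithm 1.

    A node is (metric, low, width, path); the branch metric is
    lambda_j = log P(X_j = x_j | Y_j = y_j) under the BSC model.
    """
    lam = {True: math.log(1 - eps), False: math.log(eps)}  # keyed by x_j == y_j
    nodes = [(0.0, Fraction(0), Fraction(1), [])]
    for i, q in enumerate(qs):
        children = []
        for metric, low, width, path in nodes:
            in0 = c < low + width * q           # c inside the 0-interval
            in1 = c >= low + width * (1 - q)    # c inside the 1-interval
            # Unambiguous symbol: one child. Ambiguous: both symbols are forced.
            for bit, hit in ((0, in0), (1, in1)):
                if hit:
                    new_low = low + width * (1 - q) if bit else low
                    children.append((metric + lam[bit == y[i]], new_low,
                                     width * q, path + [bit]))
        children.sort(key=lambda node: node[0], reverse=True)
        nodes = children[:m_max]                # keep the M best nodes
    return nodes[0][3]


random.seed(1)
N, T, EPS = 40, 8, 0.05
qs = schedule(N, Fraction(3, 5), T)             # alpha = 6/5 on the first N - T symbols
x = [random.randint(0, 1) for _ in range(N)]
y = [b ^ (random.random() < EPS) for b in x]    # noisy side information
c, nbits = encode(x, qs)
x_hat = decode(c, y, qs, EPS, m_max=256)
```

With y equal to x, the correct path always has the largest metric and is recovered for any M. With noisy side information, the residual error rate depends on M, N, T and the overlap; this toy is not meant to reproduce the results reported in Sect. VI.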



IEEE TRANSACTIONS ON SIGNAL PROCESSING (RESUBMITTED NOVEMBER 2008) 



Fig. 3. Distributed arithmetic decoding tree for asymmetric S-W coding.



IV. Rate allocation and choice of the overlap factor 

The length of codeword $C'_X$ is determined by the length $|I'_N|$ of the final interval, which in turn depends on how much $\tilde{p}_0^X$ and $\tilde{p}_1^X$ exceed $p_0^X$ and $p_1^X$. As a consequence, in order to select the desired rate, it is important to quantitatively determine the dependence of the expected rate on the overlap, because this will drive the selection of the desired amount of overlap. Moreover, we also need to understand how to split the overlap in order to achieve good decoding performance. In the following we derive the expected rate obtained by the DAC as a function of the set of input probabilities and the amount of overlap.

A. Calculation of the rate yielded by DAC 

We are interested in finding the expected rate R (in bps) of the codeword used by the DAC to encode the sequence $\mathbf{x}$. This is given by the following formula:

$$R = -\sum_{j=0}^{1} p_j^X \log_2 \tilde{p}_j^X$$

This can be derived straightforwardly from the property that the codeword generated by an AC has an expected length that depends on the size of the final interval, that is, on the product of the probabilities $\tilde{p}_j^X$, and hence on the amount of overlap. The expectation is computed using the true probabilities $p_j^X$.

We set $\tilde{p}_j^X = \alpha_j^X p_j^X$, where $\alpha_j^X > 1$, so that $\tilde{p}_0^X + \tilde{p}_1^X > 1$. This amounts to enlarging each interval by an amount proportional to the overlap factors $\alpha_j^X$. The expected rate achieved by the DAC becomes

$$R = \sum_{j=0}^{1} p_j^X \left( r_j^X - \delta_j^X \right) \qquad (3)$$

where $r_j^X = -\log_2 p_j^X$ and $\delta_j^X = \log_2 \alpha_j^X$. Note that $r_j^X$ represents the rate contribution of symbol j yielded by standard AC, while $\delta_j^X$ represents the decrease of this contribution, i.e., the number of bits saved in the binary representation of each input symbol equal to j.
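The unnumbered rate formula and (3) are the same quantity written in two ways. The sketch below (hypothetical names, floating point) evaluates both, which is a quick way to see how many bits per symbol a given choice of overlap factors saves with respect to H(X).

```python
import math


def dac_rate(p, alpha):
    """R = -sum_j p_j log2(alpha_j p_j): expected DAC rate in bits per symbol."""
    return -sum(pj * math.log2(aj * pj) for pj, aj in zip(p, alpha))


def dac_rate_split(p, alpha):
    """Same rate as (3): sum_j p_j (r_j - delta_j), r_j = -log2 p_j, delta_j = log2 alpha_j."""
    return sum(pj * (-math.log2(pj) - math.log2(aj)) for pj, aj in zip(p, alpha))


p = [0.8, 0.2]
alpha = [1.1, 1.5]            # alpha_0 p_0 + alpha_1 p_1 = 1.18 > 1: overlap
rate = dac_rate(p, alpha)
```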

B. Design of the overlap factors 

Once a target rate has been selected, the problem arises of selecting $\alpha_j^X$. As an example, a possible choice is to take equal overlap factors $\alpha_0^X = \alpha_1^X = \alpha_X$. This implies that each interval is enlarged by a factor $\alpha_X$ that does not depend on the source probability $p_j^X$. This leads to a target rate

$$R'_X = H(X) - \log_2 \alpha_X. \qquad (4)$$

It can be shown that this choice minimizes the rate R for a given total amount of overlap $\alpha_0^X p_0^X + \alpha_1^X p_1^X - 1$; the computations are simple and are omitted for brevity. This choice is not necessarily optimal in terms of the decoder error probability. However, optimizing for the error probability is impractical because of the nonlinearity of the arithmetic coding process.
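One way to verify the minimization claim is Jensen's inequality; this is a reconstruction offered for the reader, not necessarily the authors' derivation. Fix the total overlap $c = \alpha_0^X p_0^X + \alpha_1^X p_1^X - 1 > 0$. By (3), $R = H(X) - \sum_j p_j^X \delta_j^X$, so minimizing R means maximizing the average saving, which the concavity of $\log_2$ (with weights $p_j^X$) bounds as follows:

```latex
\sum_{j=0}^{1} p_j^X \delta_j^X
  = \sum_{j=0}^{1} p_j^X \log_2 \alpha_j^X
  \le \log_2\Big(\sum_{j=0}^{1} p_j^X \alpha_j^X\Big)
  = \log_2(1 + c),
\qquad\text{hence}\qquad
R \ \ge\ H(X) - \log_2(1 + c).
```

Equality holds if and only if $\alpha_0^X = \alpha_1^X$, in which case both factors equal $1 + c$; the minimum rate is then $H(X) - \log_2(1 + c)$, which is (4) with $\alpha_X = 1 + c$.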

In practice, one also has to make sure that the enlarged intervals $[0, \alpha_0^X p_0^X)$ and $[1 - \alpha_1^X p_1^X, 1)$ are both contained inside the [0, 1) interval. For example, taking equal overlap factors as above does not guarantee this. We have devised the following rule, which allows any desired rate to be achieved while satisfying this constraint. We impose

$$\frac{\delta_j^X}{r_j^X} = k_X \qquad (5)$$

with $k_X$ a positive constant independent of j. This leads to

$$\alpha_j^X = \left(p_j^X\right)^{-k_X}. \qquad (6)$$

With $0 < k_X \le 1$, this gives $\alpha_j^X p_j^X = (p_j^X)^{1-k_X} \le 1$, so both enlarged intervals stay inside [0, 1). The rule can be interpreted as an additional constraint that the rate reduction for symbols "0" and "1" depends on their probabilities, i.e., the least probable symbol undergoes a larger reduction. Using (6), it can be easily shown that the expected rate achieved by the DAC can be written as

$$R = (1 - k_X)\, H(X). \qquad (7)$$

Thus, the allocation problem for an i.i.d. source is very simple. We assume that the conditional entropy $H(X|Y)$ is available as in Fig. 1-b, modeling the correlation between X and Y. In asymmetric DSC, $\mathbf{x}$ should ideally be coded at a rate arbitrarily close to $H(X|Y)$. In practice, due to the suboptimality of any practical coder, some margin $\mu > 1$ should be taken. Hence, we assume that the allocation problem can be written as $(1 - k_X) H(X) = \mu H(X|Y)$. Since $\mu$ is a constant and $H(X|Y)$ and $H(X)$ are given, one can solve for $k_X$ and then perform the encoding process.
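A sketch of this allocation step, assuming an i.i.d. binary source with known $p_0^X$, a given $H(X|Y)$ and a margin $\mu$ (function names hypothetical). It solves $(1 - k_X)H(X) = \mu H(X|Y)$ for $k_X$ and returns the overlap factors of (6).

```python
import math


def binary_entropy(p0):
    """H(X) = -p0 log2 p0 - p1 log2 p1 for a binary source."""
    return -sum(p * math.log2(p) for p in (p0, 1 - p0) if p > 0)


def allocate(p0, h_x_given_y, mu):
    """Solve (1 - k_X) H(X) = mu H(X|Y) for k_X; return k_X and the
    overlap factors alpha_j = p_j^(-k_X) of (6)."""
    k = 1 - mu * h_x_given_y / binary_entropy(p0)
    if not 0 <= k < 1:
        raise ValueError("mu * H(X|Y) must lie in (0, H(X)]")
    return k, [p ** (-k) for p in (p0, 1 - p0)]


k_x, alphas = allocate(0.8, 0.3, 1.2)   # target rate mu * H(X|Y) = 0.36 bps
```

The less probable symbol receives the larger overlap factor, and since $\alpha_j^X p_j^X = (p_j^X)^{1-k_X} \le 1$, both enlarged intervals fit inside [0, 1).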

Finally, it should be noted that, while we have assumed that X and Y are i.i.d., the DAC concept can be easily extended to a nonstationary source. This simply requires considering all probabilities and overlap factors as depending on the index i; all computations, including the design of the overlap factors and the derivation of the target rate, can be extended straightforwardly. A possible application is context-based coding or Markov modeling of correlated sources. There is one caveat, though: if the probabilities and context of each symbol are computed by the decoder from past symbols, decoding errors can cause significant error propagation.

V. Distributed arithmetic coding: the symmetric case 
A. Symmetric DAC encoding and rate allocation 

In many applications, it is preferable to encode the correlated sources at similar rather than 
unbalanced rates; in this case, symmetric S-W coding can be used. Considering a pair of sources, in 
symmetric S-W coding both X and Y are encoded using separate DACs. We denote as $C'_X$ and $C'_Y$ the codewords representing X and Y, and as $R'_X$ and $R'_Y$ the respective rates. With DAC, the rates of X and Y can be adjusted with a proper selection of the parameters $k_X$ and $k_Y$ for the two DAC encoders. However, it should be noted that, for the same total rate, not all possible choices of $k_X$ and $k_Y$ are equally good, because some of them could complicate the decoder design, or be suboptimal in terms of error probability. To highlight the potential problems of a straightforward extension of the asymmetric DAC, let us assume that $k_X$ and $k_Y$ can be chosen arbitrarily. This would require a decoder that performs a search in a symbol-synchronous tree where each node represents two sequential decoder states $(\sigma_i^X, \sigma_i^Y)$ for X and Y, respectively. If the interval selection is ambiguous for both sequences, the four possible binary symbol pairs (00, 01, 10, 11) need to be included in the search space; this would accelerate the exponential growth of the tree, and quickly make the decoder search infeasible. This example shows that some constraints need to be put on $k_X$ and $k_Y$ in order to limit the growth rate of the search space.

To overcome this problem, we propose an algorithm that applies the idea of time-sharing to the
DAC. The concept of time-shared DAC has been preliminarily presented in [35] for a pair of sources in
the subcase $R'_X = R'_Y$, i.e., providing only the mid-point of the S-W rate region. In the following we
extend this to an arbitrary combination of rates, and show how it can be generalized to an arbitrary
number of sources. For two sources, the idea is to divide the set of input indexes $i = 0, 1, \ldots, N-1$
into two disjoint sets such that, at each index $i$, ambiguity is introduced in at most one of the two
sources. In particular, for sequences x and y of length N, let $\mathcal{A}_X$ and $\mathcal{A}_Y$ be the subsets of even and
odd integers in $\{0, \ldots, N-1\}$, respectively. We employ a DAC on x and y, but with different choices
of the parameters $k_X$ and $k_Y$. In particular, we let the parameters depend on the symbol index
$i$, i.e., $k_i^X$ and $k_i^Y$. The DAC of x employs parameter $k_i^X = k_X$ for all $i \in \mathcal{A}_X$, and $k_i^X = 0$
otherwise. Vice versa, y is encoded with parameter $k_i^Y = k_Y$ for all $i \in \mathcal{A}_Y$, and $k_i^Y = 0$
otherwise. As a consequence of these constraints, at each step of the decoding process, ambiguity
appears in at most one of the two sequences. In this way, the growth rate of the decoding tree
remains manageable, as no more than two new states are generated at each transition, exactly as
in the asymmetric DAC decoder; this also makes the MAP metric simpler. The conceptual relation
with time-sharing is evident: since, during DAC encoding, ambiguity is introduced in at most one of
the two encoders for each input symbol, this corresponds to switching the role
of side information between the two sources on a symbol-by-symbol basis.
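As an illustration, the per-symbol parameters of the two-source time-shared DAC can be generated as in the sketch below. It assumes, following the description above, that a parameter of 0 means no interval overlap (a classical AC step); the function names `time_shared_params` and `max_branching` are hypothetical, and the branching count is a worst-case bound that ignores whether an overlapped step actually turns out to be ambiguous.

```python
from typing import List, Tuple


def time_shared_params(n: int, k_x: float, k_y: float) -> Tuple[List[float], List[float]]:
    """Per-symbol overlap parameters k_i^X, k_i^Y for the two-source time-shared DAC.

    Even indexes (the set A_X) may be ambiguous only in x, odd indexes (A_Y) only in y.
    A parameter of 0 stands for a classical (non-overlapped) arithmetic coding step.
    """
    kx = [k_x if i % 2 == 0 else 0.0 for i in range(n)]
    ky = [k_y if i % 2 == 1 else 0.0 for i in range(n)]
    return kx, ky


def max_branching(kx: List[float], ky: List[float]) -> int:
    """Worst-case number of children of a decoding-tree node over all indexes.

    Each source whose interval is overlapped at index i may contribute two
    candidate symbols; a non-overlapped source contributes exactly one.
    """
    worst = 1
    for a, b in zip(kx, ky):
        worst = max(worst, (2 if a > 0 else 1) * (2 if b > 0 else 1))
    return worst


kx, ky = time_shared_params(8, 0.6, 0.4)
print(kx)
print(max_branching(kx, ky))                # time-shared: at most 2 children per node
print(max_branching([0.6] * 8, [0.4] * 8))  # unconstrained: up to 4 children per node
```

The factor of two between the two cases compounds at every ambiguous step, which is why the unconstrained scheme quickly makes the tree search infeasible.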

By varying the parameters $k_X$ and $k_Y$, all combinations of rates can be achieved. The achieved
rates can be derived by repeating the computations described in Sect. IV, and can be expressed as
$R'_X = \left(1 - \frac{k_X}{2}\right) H(X)$ and $R'_Y = \left(1 - \frac{k_Y}{2}\right) H(Y)$. The rate allocation problem amounts to selecting
suitable rates $R'_X$ and $R'_Y$ such that $R'_X \geq H(X|Y)$, $R'_Y \geq H(Y|X)$, and $R'_X + R'_Y \geq H(X,Y)$. In
practice one will typically take some margin $\mu > 1$, such that $R'_X + R'_Y = \mu H(X,Y)$; for safety, a
margin should also be taken on $R'_X$ and $R'_Y$ with respect to the conditional entropies. Since the prior
probabilities of X and Y are given, one can solve for $k_X$ and $k_Y$, and then perform the encoding
process. Thus, the whole S-W rate region can be swept.
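Taking the achieved rates as $R'_X = (1 - k_X/2)H(X)$ and $R'_Y = (1 - k_Y/2)H(Y)$, solving for the parameters gives $k_X = 2(1 - R'_X/H(X))$ and $k_Y = 2(1 - R'_Y/H(Y))$. The minimal sketch below performs this allocation step. It only checks the S-W inequalities (with a small numerical tolerance) and does not enforce any admissible range for $k$ that the actual DAC design may impose; the class and function names are hypothetical. For a balanced pair with $H(X) = H(Y) = 1$, $H(X|Y) = H(Y|X) = 0.5$ and a 0.6/0.9 bps split, it returns $k_X = 0.8$ and $k_Y = 0.2$.

```python
from dataclasses import dataclass
from typing import Tuple

EPS = 1e-12  # tolerance for floating-point comparisons against the S-W bounds


@dataclass
class SourcePair:
    h_x: float          # H(X), bits per symbol
    h_y: float          # H(Y)
    h_x_given_y: float  # H(X|Y)
    h_y_given_x: float  # H(Y|X)

    @property
    def h_xy(self) -> float:
        # Chain rule: H(X,Y) = H(Y) + H(X|Y)
        return self.h_y + self.h_x_given_y


def split_total_rate(s: SourcePair, mu: float, share_x: float) -> Tuple[float, float]:
    """Target rates with R'_X + R'_Y = mu * H(X,Y); share_x is the fraction given to X."""
    total = mu * s.h_xy
    return share_x * total, (1.0 - share_x) * total


def overlap_params(s: SourcePair, r_x: float, r_y: float) -> Tuple[float, float]:
    """Solve R'_X = (1 - k_X/2) H(X) and R'_Y = (1 - k_Y/2) H(Y) for k_X and k_Y."""
    if (r_x < s.h_x_given_y - EPS or r_y < s.h_y_given_x - EPS
            or r_x + r_y < s.h_xy - EPS):
        raise ValueError("rate pair lies outside the Slepian-Wolf region")
    return 2.0 * (1.0 - r_x / s.h_x), 2.0 * (1.0 - r_y / s.h_y)


pair = SourcePair(h_x=1.0, h_y=1.0, h_x_given_y=0.5, h_y_given_x=0.5)
r_x, r_y = split_total_rate(pair, mu=1.0, share_x=0.4)
print(r_x, r_y, overlap_params(pair, r_x, r_y))
```

In practice one would pick `mu` slightly above 1, as discussed above, so that the pair sits strictly inside the region.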

B. Decoding process for symmetric DAC 

Similarly to the asymmetric case, the symmetric decoding process can be viewed as a search along
a tree; however, specifically for the case of two correlated sources, each node in the tree represents the
decoding states $(\sigma^X, \sigma^Y)$ of two sequential arithmetic decoders for x and y, respectively. At each
iteration, sequential decoding is run from both states. The time-sharing approach guarantees that, for
a given index $i$, ambiguity can be found in only one of the two decoders. Therefore, at most two
branches must be considered, and the tree can be constructed using the same functions introduced in
Sect. III for the asymmetric case; the same holds for P sources. In particular, for $i \in \mathcal{A}_X$,
Test-One-Symbol($\sigma^Y$) yields an unambiguous symbol $y_i \neq A$, whereas ambiguity can be found only
while attempting to decode x with Test-One-Symbol($\sigma^X$). In summary, from the node $(\sigma^X, \sigma^Y)$
the function Test-One-Symbol is applied to both states. If ambiguity is found on $x_i$, Force-One-Symbol
is then used to explore the two alternative paths for $x_i$, whereas $y_i$ is used as side information for
branch metric evaluation. For $i \in \mathcal{A}_Y$, the roles of x and y are exchanged. Therefore,
Algorithm 1 can easily be extended to the symmetric case by alternately probing either x or y for
ambiguity, and possibly generating a branch. The joint probability distribution can be written as

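$$
P\left(X_0 = x_0, \ldots, X_{N-1} = x_{N-1}, Y_0 = y_0, \ldots, Y_{N-1} = y_{N-1} \mid C'_X, C'_Y\right)
= \prod_{i \in \mathcal{A}_X} P\left(X_i = x_i \mid Y_i, C'_X, C'_Y\right) \prod_{i \in \mathcal{A}_Y} P\left(Y_i = y_i \mid X_i, C'_X, C'_Y\right) \tag{8}
$$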
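Because (8) factorizes over the symbol indexes, the metric used to rank decoding paths can be accumulated as a sum of per-symbol log-probabilities. The sketch below shows this together with the M-algorithm pruning used in Sect. VI, assuming, as in the simulations, that Y is X observed through a binary symmetric channel with crossover probability p. It deliberately leaves out the conditioning on the codewords $C'_X, C'_Y$: in the actual decoder that information determines which candidate symbols Test-One-Symbol and Force-One-Symbol can produce, whereas here the candidates are simply passed in. All function names are hypothetical.

```python
import heapq
import math
from typing import List, Tuple

# A decoding path: (accumulated log metric, decoded x symbols, decoded y symbols).
Path = Tuple[float, List[int], List[int]]


def joint(x: int, y: int, p0: float, p: float) -> float:
    """P(X = x, Y = y) when P(X = 0) = p0 and Y is X sent through a BSC with crossover p."""
    px = p0 if x == 0 else 1.0 - p0
    return px * ((1.0 - p) if x == y else p)


def branch_log_metric(i: int, x: int, y: int, p0: float, p: float) -> float:
    """log P(X_i=x | Y_i=y) for i in A_X (even i), log P(Y_i=y | X_i=x) for i in A_Y (odd i)."""
    if i % 2 == 0:
        norm = joint(0, y, p0, p) + joint(1, y, p0, p)  # P(Y = y)
    else:
        norm = joint(x, 0, p0, p) + joint(x, 1, p0, p)  # P(X = x)
    return math.log(joint(x, y, p0, p) / norm)


def extend(path: Path, i: int, candidates: List[Tuple[int, int]],
           p0: float, p: float) -> List[Path]:
    """Children of a path at index i, one per candidate symbol pair (x_i, y_i)."""
    metric, xs, ys = path
    return [(metric + branch_log_metric(i, x, y, p0, p), xs + [x], ys + [y])
            for x, y in candidates]


def m_algorithm_prune(paths: List[Path], m: int) -> List[Path]:
    """M-algorithm step: keep only the m paths with the largest accumulated metric."""
    return heapq.nlargest(m, paths, key=lambda path: path[0])


# Index 0 is in A_X: y_0 = 1 was decoded unambiguously, while x_0 is ambiguous.
root: Path = (0.0, [], [])
children = extend(root, 0, [(0, 1), (1, 1)], p0=0.5, p=0.1)
survivors = m_algorithm_prune(children, m=1)
print(survivors[0][1], math.exp(survivors[0][0]))
```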

The symmetric encoder and decoder can easily be generalized to an arbitrary number P of sources.
The idea is to identify P subsets of input indexes $i = 0, 1, \ldots, N-1$ such that, at each symbol index $i$,
ambiguity is introduced in at most one of the P sources. In particular, for sequences $x^{(1)}, \ldots, x^{(P)}$
of length N, let $\mathcal{A}_1, \ldots, \mathcal{A}_P$ be disjoint subsets of $\{0, 1, \ldots, N-1\}$. We denote the DAC parameters
as $k^{(1)}, \ldots, k^{(P)}$. The DAC of $x^{(j)}$ employs parameter $k_i^{(j)} = k^{(j)}$ for all $i \in \mathcal{A}_j$, and $k_i^{(j)} = 0$
otherwise. As a consequence of these constraints, at each step of the decoding process, ambiguity
appears in at most one of the P sequences. Note that this formulation also encompasses the case
in which one or more sources are independent of each other and of all the others; these sources can be
coded with a classical AC, taking $\mathcal{A}_j = \emptyset$ for each such source.

The selection of the sets $\mathcal{A}_j$ and the overlap factors $k^{(j)}$, for $j = 1, \ldots, P$, is still somewhat
arbitrary, as the expected rate of source j depends on both the cardinality of $\mathcal{A}_j$ and the value of $k^{(j)}$.
In a realistic application it would be more practical to fix the sets $\mathcal{A}_j$ once and for all, and to modify
the parameters $k^{(j)}$ so as to obtain the desired rate. This is because, for time-varying correlations,
the rate has to be updated on the fly. In a distributed setting, varying one parameter $k^{(j)}$ requires
communicating the change only to source j, while varying the sets $\mathcal{A}_j$ requires communicating the
change to all sources. Therefore, we define $\mathcal{A}_j$ such that the P statistically dependent sources take
turns in the role of side information. Any additional independent sources are coded separately using
$\mathcal{A}_j = \emptyset$. In particular, we set $\mathcal{A}_j = \{i : i \,\%\, P = j\}$, where $\%$ denotes the remainder of the division
between two integers, and $0 \,\%\, j = 0$. The DAC encoder for the j-th source inserts ambiguity only
at time instants $i \in \mathcal{A}_j$. At each node, the decoder stores the states of the P arithmetic decoders,
and possibly performs a branching if the codeword related to the only potentially ambiguous symbol
at the current time $i$ is actually ambiguous. Although this encoding and decoding structure is not
necessarily optimal, it does lead to a viable decoding strategy.
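The index sets described above are easy to generate and check. The sketch below numbers the sources from 0 to $P-1$, so that $\mathcal{A}_j = \{i : i \,\%\, P = j\}$ covers every index exactly once. It also uses a simple expected-rate model, assumed here rather than taken from the text, in which an overlapped symbol costs $(1 - k^{(j)})H(X^{(j)})$ bits and a non-overlapped one $H(X^{(j)})$ bits; with $P = 2$ this reduces to the $(1 - k/2)H$ form of the two-source rates. Function names are hypothetical.

```python
from typing import List, Set


def index_sets(n: int, num_sources: int) -> List[Set[int]]:
    """A_j = {i : i mod P = j} for the P dependent sources, numbered 0..P-1 here."""
    return [{i for i in range(n) if i % num_sources == j} for j in range(num_sources)]


def ambiguous_source(i: int, num_sources: int) -> int:
    """Index of the only source whose symbol may be ambiguous at time i."""
    return i % num_sources


def expected_rate(h: float, k: float, overlapped: int, n: int) -> float:
    """Assumed model: overlapped symbols cost (1 - k) * h bits, the others h bits."""
    return (overlapped * (1.0 - k) * h + (n - overlapped) * h) / n


n, num_sources = 12, 3
sets = index_sets(n, num_sources)
empty: Set[int] = set()  # an independent source uses A_j = {} and is coded with a classical AC
print([sorted(a) for a in sets])
print([expected_rate(1.0, 0.6, len(a), n) for a in sets], expected_rate(1.0, 0.6, len(empty), n))
```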

VI. Results: asymmetric coding 

In the following we provide results of a performance evaluation of the DAC. We implement
a communication system that employs a DAC and a joint decoder, with no feedback channel; at the
decoder, pruning is performed using the M-algorithm [39], with M = 2048. The side information is
obtained by sending the source X through a binary symmetric channel with transition probability p,
which measures the correlation between the two sources. We simulate a source with both balanced
($p_0 = 0.5$) and skewed ($p_0 > 0.5$) symbol probabilities. The first setting implies $H(X) = H(Y) = 1$
and $H(X,Y) = 1 + H(X|Y)$, where $H(X|Y)$ depends on p. The closer p is to 0.5, the less correlated
the sources, and hence the higher $H(X|Y)$. In the skewed case, given $p_0$, $H(X)$ is fixed, whereas
both $H(Y)$ and $H(X|Y)$ depend on p. Unless otherwise specified, each point of the figures and tables
presented in the following has been generated by averaging the results obtained encoding $10^7$ samples.
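For reference, the entropies quoted in this section follow from the binary entropy function once $p_0 = P(X = 0)$ and the crossover probability p of the binary symmetric channel are fixed. A minimal sketch (helper names are hypothetical):

```python
import math


def h2(q: float) -> float:
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def bsc_entropies(p0: float, p: float) -> dict:
    """Entropies (bits) when P(X = 0) = p0 and Y is X sent through a BSC with crossover p."""
    h_x = h2(p0)
    p_y1 = (1.0 - p0) * (1.0 - p) + p0 * p  # P(Y = 1)
    h_y = h2(p_y1)
    h_xy = h_x + h2(p)                      # H(X,Y) = H(X) + H(Y|X), with H(Y|X) = h2(p)
    return {"H(X)": h_x, "H(Y)": h_y, "H(X|Y)": h_xy - h_y, "H(X,Y)": h_xy}


print(bsc_entropies(0.5, 0.11))  # balanced: H(Y) = 1 and H(X|Y) = h2(p), about 0.5 here
print(bsc_entropies(0.8, 0.1))   # skewed: H(X) = h2(0.8) does not depend on p
```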

A. Effect of termination 

As a first experiment, the benefit of the termination policy is assessed. An i.i.d. stationary source
X emits sequences x of N = 200 symbols, with $p_0 = 0.5$ and $H(X|Y) = 0.25$, which are encoded
with DAC at fixed rate 0.5 bps, i.e., 0.25 bps higher than the theoretical S-W bound. For Y we
assume ideal lossless encoding at average rate $H(Y) = 1$ bps, so that the total average rate of X
and Y is 1.5 bps. The bit error rate (BER) yielded by the decoder is measured for increasing values
of the number of termination symbols T. The same simulation is performed with N = 1000. In all
simulated cases, the DAC overlap has been selected to compensate for the rate penalty incurred by
the termination, so as to achieve the 1.5 bps overall target rate. The overlap factors $\alpha_j$ are selected
according to (6).

The results are shown in Fig. 4; it can be seen that the proposed termination is effective at reducing
the BER. There is a trade-off in that, for a given rate, increasing T reduces the effect of errors in the
last symbols, but requires more overlap of the intervals. It is also interesting to consider the position
of the first decoding error since, without termination, errors tend to cluster at the end of the block. For
N = 200, the mean position is 191, 178, 168, 161 and 95, with standard deviation 13, 18, 25,
36 and 49, respectively, for T equal to 0, 5, 10, 15 and 20. For N = 1000, the mean is 987,
954, 881, 637 and 536, with standard deviation 57, 124, 229, 308 and 299. The optimal values of
T are around 15-20 symbols. Therefore, we have selected T = 15 and used this value for all the
experiments reported in the following.
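The compensation mentioned above can be made concrete with a simplified rate model, assumed here for illustration only: the first $N - T$ symbols are coded with a single overlap parameter k costing $(1-k)H(X)$ bits each on average, and the T termination symbols are coded without overlap at $H(X)$ bits each. The experiments instead use the per-symbol overlap factors $\alpha_j$ of (6), so this sketch (hypothetical function name) only shows the trend: a longer termination requires more overlap on the remaining symbols, and the extra overlap shrinks as N grows.

```python
def overlap_with_termination(n: int, t: int, rate: float, h: float) -> float:
    """Overlap k such that ((n - t)(1 - k) h + t h) / n equals the target rate.

    Solves (n - t)(1 - k) h + t h = n * rate for k.
    """
    if not 0 <= t < n:
        raise ValueError("need 0 <= T < N")
    return 1.0 - (n * rate / h - t) / (n - t)


for n in (200, 1000):
    ks = [overlap_with_termination(n, t, rate=0.5, h=1.0) for t in (0, 5, 10, 15, 20)]
    print(n, [round(k, 4) for k in ks])
```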

B. Effect of the overlap design rule 

Next, an experiment has been performed to validate the theoretical analysis of the effects of different
overlap designs presented in Sect. IV-B. Fig. 5 shows the performance obtained using the designs of
equations (4) and (6), respectively. The experimental settings are N = 200, $p_0 = 0.8$, fixed
rate for x of 0.5 bps, and total average rate for X and Y equal to 1.5 bps, with ideal lossless encoding
of Y at rate $H(Y)$. The BER is reported as a function of the source correlation expressed in terms
of $H(X,Y)$. The two overlap design rules yield almost equivalent performance, although the rule in (6)
consistently outperforms that in (4), confirming that the latter is optimal only in terms of rate.
A noticeable difference appears only when $H(X,Y)$ is very high (i.e., for weakly correlated sources).
However, this case is of marginal interest, since the performance is poor (the BER is of the order of 0.1).

Fig. 4. BER as a function of T (number of termination symbols); $p_0 = 0.5$, total rate = 1.5 bps, rate of x = 0.5 bps,
$H(X|Y) = 0.25$.

C. Performance evaluation at fixed rate 

The performance of the proposed system is compared with that of a system where the DAC encoder
and decoder are replaced by a punctured turbo code similar to that in [6]. We use turbo codes with rate-1/2
generators (17,15) octal (8 states) and (31,27) octal (16 states), employ S-random interleavers,
and run 15 decoder iterations. We consider the case of a balanced source ($p_0 = p_1 = 0.5$) and of a skewed
source (in particular $p_0 = 0.9$ and $p_0 = 0.8$). For a skewed source, as an improvement with respect
to [6], the turbo decoder has been modified by adding the a priori term to the decoder metric, as
done in [16]. Block sizes N = 50, N = 200 and N = 1000 have been considered (with S-random
interleaver spread of 5, 11 and 25, respectively); this allows us to assess the DAC performance at small
and medium block lengths. Besides turbo codes, we also considered the rate-compatible LDPC codes
proposed in [21]. For these codes, a software implementation is publicly available on the web; among
the available pre-designed codes, we used the matrix for N = 396, which is comparable with the
block lengths considered for the DAC and the turbo code.

Fig. 5. Performance comparison between the use of different overlap rules ($p_0 = 0.8$, total rate = 1.5 bps).

The results are worked out in a fixed-rate coding setting as in [14], i.e., the rate is the same for each
sample realization of the source. Fig. 6 reports the results for the balanced source case; the abscissa
is $H(X,Y)$, which is related to p. The performance is measured in terms of the residual BER after
decoding, which is akin to the distortion in the Wyner-Ziv binary coding problem with Hamming 
metric. Both the DAC and the turbo code generate a description of x at fixed rate 0.5 bps; the total 
average rate of X and Y is 1.5 bps, with ideal lossless encoding of Y at rate H(Y). Since H(Y) = 1, 
we also have that $H(X,Y) = 1 + H(X|Y)$. This makes it possible to compare these results with the
case of skewed sources which is presented later in this section, so as to verify that the performance 
is uniformly good for all distributions. The Wyner-Ziv bound for a doubly symmetric binary source 
with Hamming metric is also reported for comparison. 
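The Wyner-Ziv bound used for comparison can be evaluated from Wyner and Ziv's closed-form result for the doubly symmetric binary source with Hamming distortion: $R_{WZ}(D)$ is the lower convex envelope of $g(D) = h(p \ast D) - h(D)$, for $0 \le D < p$, and of the point $(p, 0)$, where h is the binary entropy and $p \ast D = p(1-D) + D(1-p)$. The sketch below (hypothetical function names, grid search rather than a closed-form tangent point, so the result is approximate) returns the smallest distortion achievable at a given rate; for the balanced source of Fig. 6, p is obtained from $H(X,Y) = 1 + h(p)$.

```python
import math


def h2(q: float) -> float:
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def wz_min_distortion(p: float, rate: float, grid: int = 20000) -> float:
    """Approximate smallest Hamming distortion D with R_WZ(D) <= rate (DSBS, crossover p).

    Scans points (d, g(d)) with g(d) = h2(p*d) - h2(d), each possibly time-shared
    with the zero-rate point (p, 0); this traces the lower convex envelope.
    """
    best = p  # zero rate: output the side information, distortion p
    for n in range(grid):
        d = p * n / grid
        g = h2(p * (1.0 - d) + d * (1.0 - p)) - h2(d)
        if g <= rate:
            best = min(best, d)
        else:
            beta = rate / g  # fraction of time spent at the point (d, g(d))
            best = min(best, beta * d + (1.0 - beta) * p)
    return best


# x coded at 0.5 bps; p = 0.2 corresponds to H(X,Y) = 1 + h2(0.2), about 1.72
print(wz_min_distortion(0.2, 0.5))
```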

As can be seen, the performance of DAC slightly improves as the block length increases. This is
mostly due to the effect of the termination. As the number of symbols used to terminate the encoder is
chosen independently of the block length, the rate penalty for not overlapping the last symbols weighs
more when the block length is small, while the effect vanishes for large block lengths. In [34], where
the termination effect is not considered, the performance is shown to be almost independent of
the block size. It should also be noted that the value of M required for near-optimal performance
grows exponentially with the block size. As a consequence, the value of M that yields near-optimal
performance for N = 50 or N = 200 limits the performance for N = 1000.
We compared both 8-state and 16-state turbo codes. The 8-state code is often used in practical
applications, as it exhibits a good trade-off between performance and complexity; the 16-state code
is more powerful, and requires more computations. It can be seen that, for block lengths N = 50 and
N = 200, the proposed system outperforms both the 8-state and the 16-state turbo codes. For block length
N = 1000, the DAC performs better than the 8-state turbo code, and is equivalent to the 16-state
code. It should be noted that, in this experiment, only the "channel coding performance" of the DAC
is tested, since for the balanced source no compression is possible as $H(X) = 1$. Consequently, it is
remarkable that the DAC turns out to be generally more powerful than the turbo code at equal block
length. Note that the performance of the 16-state code is limited by the error floor, and could be
improved using an ad-hoc design of the code or the interleaver; the DAC has no error floor, but its
waterfall is less steep. For $H(X|Y) > 0.35$, a result not reported in Fig. 6 shows that the DAC with
N = 200 and N = 1000 also outperforms the 8-state turbo code with N = 5000. In Fig. 6 and in the
following, it can be seen that turbo codes do not show the typical cliff effect. This is due to the fact
that, at the block lengths considered in this paper, the turbo code is still very far from capacity;
its performance improves for larger block lengths, where the cliff effect can be seen. In terms of
rate penalty, setting a residual BER threshold of $10^{-4}$, for N = 200 the DAC is almost 0.3 bps away
from the S-W limit, while the best 16-state turbo code simulated in this paper is 0.35 bps away;
for N = 1000 the DAC is 0.26 bps away, while the best 8-state turbo code is 0.30 bps away. The
performance of the LDPC code for N = 396 lies halfway between those of the turbo codes for N = 200 and
N = 1000, and is hence very similar to that of the DAC.

The results for a skewed source are reported in Fig. 7 for $p_0 = 0.8$. In this setting, we select various
values of $H(X,Y)$, and encode x at a fixed rate such that the total average rate for X and Y equals
1.5 bps, with ideal lossless encoding of Y at rate $H(Y)$. For Fig. 7, from left to right, the rates of
x are 0.68, 0.67, 0.66, 0.64, 0.63, 0.61, 0.59, and 0.58 bps, respectively. Consistent with [30], all
turbo codes considered in this work perform rather poorly on skewed sources. In [30] this behavior
is explained with the fact that, when the source is skewed, the states of the turbo code are used with 
uneven probability, leading to a smaller equivalent number of states. On the other hand, the DAC has 
good performance also for skewed sources, as it is designed to work with unbalanced distributions. 
The performance of the LDPC codes is similar to that of the best turbo codes, and slightly worse 
than the DAC. 

Similar remarks can be made in the case of $p_0 = 0.9$, which is reported in Fig. 8. In this case, we
have selected a total rate of 1 bps, since the source is more unbalanced and hence easier to compress.
The rates for x are 0.31, 0.34, 0.37, 0.39, 0.42, 0.44, and 0.47 bps, respectively. In this case the turbo
code performance is better than in the previous case, although it is still poorer than that of the DAC. This is due
to the fact that the sources are more correlated, and hence the crossover probability of the virtual
channel is lower. Therefore, the turbo code has to correct fewer errors, whereas for
$p_0 = 0.8$ the correlation was weaker and hence the crossover probability was higher.

Fig. 6. Performance comparison of data communication systems ($p_0 = 0.5$, total rate = 1.5 bps, rate for x = 0.5 bps):
DAC versus turbo coding, balanced source. DAC: distributed arithmetic coding; TC8S and TC16S: 8- and 16-state turbo
code with S-random interleaver; LDPC-R and LDPC-I: regular and irregular LDPC codes from [21].

D. Performance evaluation for strongly correlated sources 

We also considered the case of strongly correlated sources, for which high-rate channel codes are 
needed. These sources are a good model for the most significant bit-planes of several multimedia 
signals. Due to the inefficiency of syndrome-based coders, practical schemes often assume that no 
DSC is carried out on those bit-planes, e.g., they are not transmitted, and at the decoder they are 
directly replaced by the side information [9]. 

The results for the DAC and the 16-state turbo code, when a rate of 0.1 bps is used for x, are
reported in Tab. I. The table also reports the crossover probability p, which, for a balanced
source, corresponds to the BER of an uncoded system that reconstructs x as the side information y. As
can be seen, the DAC has similar performance to the turbo codes and LDPC codes, and becomes
better when the source is extremely correlated, i.e., $H(X|Y) = 0.001$.
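For a balanced source, $H(X|Y) = h(p)$, with h the binary entropy function, so the p column of Tab. I is obtained by inverting h on $[0, 0.5]$. A minimal bisection sketch (hypothetical helper names):

```python
import math


def h2(q: float) -> float:
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def h2_inverse(h: float, iterations: int = 100) -> float:
    """The q in [0, 0.5] with h2(q) = h, by bisection (h2 is increasing on that interval)."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("entropy must lie in [0, 1]")
    lo, hi = 0.0, 0.5
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h2(mid) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for h in (0.1, 0.01, 0.001):
    print(h, f"{h2_inverse(h):.2e}")
```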
Fig. 7. Performance comparison of data communication systems ($p_0 = 0.8$, total rate = 1.5 bps): DAC versus turbo
coding, skewed source. DAC: distributed arithmetic coding; TC8S and TC16S: 8- and 16-state turbo code with S-random
interleaver; LDPC-R and LDPC-I: regular and irregular LDPC codes from [21].




Fig. 8. Performance comparison of data communication systems ($p_0 = 0.9$, total rate = 1 bps): DAC versus turbo coding,
skewed source. DAC: distributed arithmetic coding; TC8S and TC16S: 8- and 16-state turbo code with S-random interleaver;
LDPC-R and LDPC-I: regular and irregular LDPC codes from [21].



E. Performance evaluation at variable rate 

Finally, the coding efficiency of DAC is measured in terms of the expected rate required to achieve
error-free decoding. This amounts to re-encoding the sequence at increasing rates, and represents
the optimal DAC performance if the encoder could exactly predict the decoder behavior. Since each
realization of the source is encoded using a different number of bits, this case is referred to as
variable-rate encoding. This scenario is representative of practical distributed compression settings,
e.g., [6], in which one seeks the shortest code that allows each realization of the source process to be
reconstructed without errors.

TABLE I

Residual BER in case of strongly correlated sources, with $p_0 = 0.5$ and rate for x equal to 0.1 bps.

| $H(X \mid Y)$ | p | DAC, N = 200 | TC16S, N = 200 | DAC, N = 1000 | TC16S, N = 1000 | LDPC-R, N = 396 | LDPC-I, N = 396 |
|---|---|---|---|---|---|---|---|
| 0.1 | $1.3 \cdot 10^{-2}$ | $2.25 \cdot 10^{-2}$ | $1.05 \cdot 10^{-2}$ | $2.10 \cdot 10^{-2}$ | $1.18 \cdot 10^{-2}$ | $1.20 \cdot 10^{-2}$ | $1.11 \cdot 10^{-2}$ |
| 0.01 | $8.6 \cdot 10^{-4}$ | $2.55 \cdot 10^{-4}$ | $1.74 \cdot 10^{-4}$ | $1.5 \cdot 10^{-5}$ | $2.9 \cdot 10^{-6}$ | $1.18 \cdot 10^{-4}$ | $1.01 \cdot 10^{-4}$ |
| 0.001 | $6.5 \cdot 10^{-5}$ | $1.5 \cdot 10^{-6}$ | $7.0 \cdot 10^{-6}$ | $< 1 \cdot 10^{-6}$ | $1.0 \cdot 10^{-6}$ | $4.65 \cdot 10^{-6}$ | $7.58 \cdot 10^{-6}$ |

For this simulation, the following setup is used. The source correlation H(X\Y) is kept constant 
and, for each sample realization of the source, the total rate is progressively increased beyond the 
S-W bound, in steps of 0.01 bps, until error-free decoding is obtained. This operation is repeated on 
1000 different realizations of the source; the mean value and standard deviation of the rates yielding 
correct decoding are then computed. 
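The rate search just described can be sketched as follows. The callback `decodes_ok(rate)` is hypothetical: it stands for encoding one source realization at the given rate and running the joint decoder, returning True when decoding is error-free. Rates are generated as bound + n·step rather than by repeated addition, to avoid accumulating floating-point error, and the standard deviation computed here is the sample standard deviation.

```python
import statistics
from typing import Callable, Iterable, Tuple


def min_decodable_rate(decodes_ok: Callable[[float], bool], bound: float,
                       step: float = 0.01, max_rate: float = 2.0) -> float:
    """Smallest rate bound + n*step (n = 0, 1, ...) for which decoding is error-free."""
    n = 0
    while bound + n * step <= max_rate + 1e-12:
        rate = bound + n * step  # computed from n to avoid accumulating rounding errors
        if decodes_ok(rate):
            return rate
        n += 1
    raise RuntimeError("no error-free decoding up to max_rate")


def variable_rate_stats(realizations: Iterable[Callable[[float], bool]],
                        bound: float, step: float = 0.01) -> Tuple[float, float]:
    """Mean and sample standard deviation of the minimal rates over the realizations."""
    rates = [min_decodable_rate(ok, bound, step) for ok in realizations]
    return statistics.mean(rates), statistics.stdev(rates)


# Toy stand-ins for "encode at this rate, then jointly decode": each realization
# decodes correctly once the rate reaches its own (made-up) threshold.
toy = [lambda r, t=t: r >= t - 1e-9 for t in (0.53, 0.56, 0.59)]
mean, sd = variable_rate_stats(toy, bound=0.5)
print(round(mean, 3), round(sd, 3))
```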

The results have been worked out for block length N = 200, with probabilities $p_0 = 0.5$ and
$p_0 = 0.9$. For $p_0 = 0.5$, the conditional entropy $H(X|Y)$ (i.e., the S-W bound) has been set to 0.5
bps. For $p_0 = 0.9$, the joint entropy $H(X,Y)$ has been set to 1 bps; this amounts to coding Y at the
ideal rate of $H(Y) \approx 0.715$ bps, with an S-W bound $H(X|Y) \approx 0.285$ bps.
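These figures can be checked numerically: with $p_0 = 0.9$, $H(X) = h(0.9) \approx 0.469$ bps, so the constraint $H(X,Y) = H(X) + h(p) = 1$ bps fixes the crossover probability p, from which $H(Y)$ and $H(X|Y) = H(X,Y) - H(Y)$ follow. A bisection sketch (hypothetical helper names):

```python
import math


def h2(q: float) -> float:
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def crossover_for_joint_entropy(p0: float, h_xy: float) -> float:
    """Bisection for p in [0, 0.5] with h2(p0) + h2(p) = h_xy (h2 increases on [0, 0.5])."""
    target = h_xy - h2(p0)  # required H(Y|X) = h2(p)
    if not 0.0 <= target <= 1.0:
        raise ValueError("joint entropy not reachable for this p0")
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h2(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p0 = 0.9
p = crossover_for_joint_entropy(p0, 1.0)
p_y1 = (1.0 - p0) * (1.0 - p) + p0 * p  # P(Y = 1), with p0 = P(X = 0)
h_y = h2(p_y1)
print(f"p = {p:.4f}, H(Y) = {h_y:.3f}, H(X|Y) = {1.0 - h_y:.3f}")
```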

The results are reported in Tab. II. As can be seen, the DAC has a rate loss of about 0.06 bps
with respect to the S-W bound for both the symmetric and the skewed source. The turbo code exhibits
a loss of about 0.2 bps and 0.13 bps, respectively. The LDPC-R code has a relatively small loss, i.e., 0.12 bps
in the symmetric case and 0.10 bps in the skewed one. The LDPC-I code has a slightly smaller loss,
i.e., 0.09 bps in the symmetric case and 0.075 bps in the skewed one. However, the DAC still performs
slightly better. It should be noted that, while for LDPC and turbo codes the encoding is done only
once thanks to rate-compatibility, for the DAC multiple encodings are necessary, leading to higher
complexity.

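TABLE II

Performance comparison for variable-rate coding: mean and standard deviation of the rate (bps) needed for
lossless compression. For $p_0 = 0.5$, $H(X|Y) = 0.50$ and $H(X,Y) = 1.50$; for $p_0 = 0.9$, $H(X|Y) = 0.285$ and $H(X,Y) = 1.0$.

| Code | $p_0 = 0.5$: mean | $p_0 = 0.5$: st.dev. | $p_0 = 0.9$: mean | $p_0 = 0.9$: st.dev. |
|---|---|---|---|---|
| DAC, N = 200 | 0.56 | 0.04 | 0.32 | 0.03 |
| LDPC-R, N = 396 | 0.62 | 0.06 | 0.37 | 0.05 |
| LDPC-I, N = 396 | 0.59 | 0.06 | 0.35 | 0.05 |
| TC16S, N = 200 | 0.71 | 0.11 | 0.42 | 0.08 |
| TC16S, N = 1000 | 0.70 | 0.05 | 0.41 | 0.04 |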

F. Performance versus complexity 

As mentioned above, the DAC performance is a function of the block size and especially of the
decoder parameter M. Tab. III reports comparative decoding results of DAC, turbo and LDPC codes
for various values of M and N. The simulations have been made under the same conditions as in Fig.
6, i.e., $p_0 = 0.5$, total average rate equal to 1.5 bps, and fixed rate of x equal to 0.5 bps, considering
the case $H(X|Y) = 0.25$. Tab. III reports the residual BER and the running time in milliseconds,
obtained by running the different decoders on a workstation with a Pentium IV 3 GHz processor running
Windows XP.

As can be seen, the DAC decoding time grows faster than linearly with M (roughly three- to fourfold for
each doubling of M at the larger values tested). Increasing M typically improves
performance, and the improvement is larger as N increases. Comparing DAC and turbo codes at
approximately equal computation time, it can be seen that, for N = 50 and N = 200, the DAC
performance is significantly better, while the turbo code outperforms DAC for N = 1000. For LDPC
codes, the results for N = 396 can be compared with the DAC for N = 200. It can be seen that,
with similar computation time, DAC and LDPC codes have comparable performance: the BER yielded
by the LDPC code is about four times smaller than that of the DAC, although it would increase going from
N = 396 to N = 200.

VII. Results: symmetric coding 

In the following we provide results for the symmetric DAC. We consider two sources with balanced
($p_0 = 0.5$) and unbalanced ($p_0 = 0.9$) distributions with arbitrary rate splitting, and use M = 2048.



IEEE TRANSACTIONS ON SIGNAL PROCESSING (RESUBMITTED NOVEMBER 2008) 23 

TABLE III

Decoder complexity and performance for DAC, turbo codes and LDPC codes.

| Algorithm | Parameter | BER | Time (ms) |
|---|---|---|---|
| DAC, N = 50 | M = 64 | $1.20 \cdot 10^{-2}$ | 2.26 |
| DAC, N = 50 | M = 256 | $4.89 \cdot 10^{-3}$ | 9.64 |
| DAC, N = 50 | M = 512 | $3.49 \cdot 10^{-3}$ | 22.78 |
| DAC, N = 50 | M = 1024 | $2.93 \cdot 10^{-3}$ | 70.72 |
| DAC, N = 50 | M = 2048 | $2.61 \cdot 10^{-3}$ | 254.10 |
| TC16S, N = 50 | 15 iterations | $8.60 \cdot 10^{-3}$ | 9.30 |
| DAC, N = 200 | M = 64 | $3.15 \cdot 10^{-3}$ | 9.77 |
| DAC, N = 200 | M = 256 | $8.53 \cdot 10^{-4}$ | 44.96 |
| DAC, N = 200 | M = 512 | $4.55 \cdot 10^{-4}$ | 119.94 |
| DAC, N = 200 | M = 1024 | $2.80 \cdot 10^{-4}$ | 394.33 |
| DAC, N = 200 | M = 2048 | $2.00 \cdot 10^{-4}$ | 1538.43 |
| TC16S, N = 200 | 15 iterations | $2.74 \cdot 10^{-3}$ | 30.37 |
| DAC, N = 1000 | M = 64 | $5.36 \cdot 10^{-3}$ | |
| DAC, N = 1000 | M = 256 | $1.06 \cdot 10^{-3}$ | 251.32 |
| DAC, N = 1000 | M = 512 | $5.25 \cdot 10^{-4}$ | 766.80 |
| DAC, N = 1000 | M = 1024 | $2.84 \cdot 10^{-4}$ | 2864.06 |
| DAC, N = 1000 | M = 2048 | $1.71 \cdot 10^{-4}$ | 11545.94 |
| TC16S, N = 1000 | 15 iterations | $1.2 \cdot 10^{-5}$ | 188.11 |
| LDPC-R, N = 396 | 100 iterations | $2.27 \cdot 10^{-4}$ | 16.95 |
| LDPC-I, N = 396 | 100 iterations | $2.14 \cdot 10^{-4}$ | 20.18 |



A. Performance evaluation at fixed rate 

For fixed rate, we set the total rate of x and y equal to 1.5 bps. We consider two cases of rate
splitting. In the first case the rate is equally split; we choose $k_X = k_Y$ so as to achieve a rate of 0.75
bps for each source. In the second case we encode x at 0.6 bps and y at 0.9 bps.

The performance of the symmetric DAC is worked out for N = 200 and N = 1000. Since
symmetric DSC coders typically reconstruct each sequence either without any errors or with a large
number of errors [28], we report the frame error rate (FER), i.e., the probability that a data block
contains at least one error after joint decoding, instead of the residual BER. For each point, we
simulated at least $10^7$ bits.

Fig. 9 shows the results for the symmetric DAC. Comparisons with other algorithms can be made
based on the following remarks. In [31], a symmetric S-W coder employing turbo codes is proposed,
which can achieve any rate splitting. In the case that one source is encoded without ambiguity, this
reduces to the asymmetric turbo-based S-W coder we have employed in Sect. VI. In [31] it is reported
that this algorithm achieves its best performance at the asymmetric points of the S-W region, while
it is slightly poorer at the intermediate points. Therefore, in Fig. 9 we report the FER corresponding
to the best turbo code shown in Fig. 6 for N = 200 and N = 1000, as this lower-bounds the FER
achieved by [31] over the entire S-W region. Moreover, we also report the FER achieved by irregular
LDPC codes with block length N = 396 [21]. The asymmetric algorithm in [21] has been extended
in [33] to arbitrary rate splitting, showing that the performance is uniformly good over the entire S-W
region. Finally, we also report the FER curve of the asymmetric DAC for N = 1000.



Fig. 9. Performance comparison of data communication systems ($p_0 = 0.5$, total rate = 1.5 bps). DAC: distributed
arithmetic coding; TC16S: 16-state turbo code with S-random interleaver; LDPC-I: irregular LDPC codes from [21].

In Fig. 9, the results for symmetric coding are very similar to those observed in the
asymmetric case. The DAC achieves very similar BER for N = 200 and N = 1000; since a shorter block
contains fewer bits that can be in error, the FER is smaller for N = 200. The results are almost
independent of the rate splitting between x and y,
as can be seen by comparing the two rate-splitting cases as well as the asymmetric DAC. The turbo
codes for N = 200 and N = 1000, and the irregular LDPC code, exhibit poorer performance than the
DAC.

B. Performance evaluation at variable rate

For variable rate coding, we consider the same two settings as in Sect. VI-E, i.e., block length
N = 200, with probabilities $p_0 = 0.5$ and $p_0 = 0.9$; in the first case the conditional entropy has been
set to 0.5 bps, while in the second case the joint entropy $H(X,Y)$ has been set to 1 bps. The results
are shown in Fig. 10. As can be seen, the performance of the symmetric DAC is uniformly good over 
the entire S-W region, and is significantly better than turbo codes and LDPC codes. In particular, 
the DAC suboptimality is between 0.03 and 0.06 bps, as opposed to 0.07-0.09 bps for the irregular LDPC
code, and 0.14-0.21 bps for the turbo code. It should be noted, however, that variable rate coding requires
feedback, while the S-W bound is achievable with no feedback, with vanishing error probability as
$N \to \infty$. In our simulations we re-encode the sequence at increasing rates (in steps of 0.01 bps),
which represents the optimal DAC performance if the encoder could exactly predict the decoder
behavior.




Fig. 10. Performance comparison at variable rate. The curves in the top-right corner refer to the case of $p_0 = 0.5$, and
those in the bottom-left corner to $p_0 = 0.9$. DAC: distributed arithmetic coding; TC16S: 16-state turbo code with S-random
interleaver; LDPC-I: irregular LDPC codes from [21]. The solid curves represent the S-W bound.



VIII. Discussion and conclusions 

We have proposed DAC as an alternative to existing DSC coders based on channel codes. DAC 
can operate in the entire S-W region, providing both asymmetric and symmetric coding. 

DAC achieves good compression performance, with uniformly good results over the S-W rate 
region; in particular, its performance is comparable with or better than that of turbo and LDPC 
codes at small and medium block lengths. This is very important in many applications, e.g., in the 
multimedia field, where the encoder partitions the compressed file into small units (e.g., packets in
JPEG 2000, slices and NALUs in H.264/AVC) that have to be coded independently. 

As for encoding complexity, which is of great interest for DSC, DAC has linear encoding
complexity, like a classical AC [41]. Turbo codes and the LDPC codes in [21] also have linear encoding
complexity, whereas general LDPC codes have superlinear, typically quadratic, encoding
complexity [42]. As a consequence, the complexity of DAC is suitable for DSC applications.

A major advantage of DAC lies in the fact that it can exploit statistical prior knowledge about 
the source very easily. This is a strong asset of AC, which is retained by DAC. Probabilities can be 
estimated on the fly based on past symbols; context-based models employing conditional probabilities
can also be used, as well as other models providing the required probabilities. These models make it possible to
account for the nonstationarity of typical real-world signals, which is a significant advantage over DSC
coders based on channel codes. In fact, for channel codes, accounting for time-varying correlations
requires adjusting the code rate, which can only be done for the next data block, incurring a significant
adaptation delay. Moreover, with channel codes it is not easy to take advantage of prior information;
for turbo codes it has been shown to be possible [43], but it requires a more sophisticated decoder.
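As an example of the on-the-fly estimation mentioned above, the sketch below uses a count-based (Krichevsky-Trofimov style, counts initialized to 1/2) estimate of $P(X_i = 0)$ computed from past symbols only, so that encoder and decoder can reproduce it. This is a generic adaptive model, not a method described in this paper, and the class and function names are hypothetical. In a sequential DAC decoder each path has its own decoded past, so each path would need its own copy of the model state, which is what `copy()` is for.

```python
from typing import List


class AdaptiveBinaryModel:
    """Estimate of P(symbol = 0) from counts of past symbols (counts start at 1/2)."""

    def __init__(self, initial_count: float = 0.5) -> None:
        self.counts = [initial_count, initial_count]

    def p0(self) -> float:
        return self.counts[0] / (self.counts[0] + self.counts[1])

    def update(self, bit: int) -> None:
        self.counts[bit] += 1.0

    def copy(self) -> "AdaptiveBinaryModel":
        # Each decoding path keeps its own model, since paths differ in their decoded past.
        clone = AdaptiveBinaryModel()
        clone.counts = list(self.counts)
        return clone


def probabilities_used(bits: List[int]) -> List[float]:
    """P(0) that the coder would use for each symbol, computed before seeing that symbol."""
    model = AdaptiveBinaryModel()
    used = []
    for b in bits:
        used.append(model.p0())
        model.update(b)
    return used


print(probabilities_used([0, 0, 1, 0]))
```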

Another advantage of the proposed DAC lies in the fact that the encoding process can be seen as 
a simple extension of the AC process. As a consequence, it is straightforward to extend an existing 
scheme employing AC as final entropy coding stage in order to provide DSC functionalities. 